The Comparison of Gini and Twoing Algorithms in Terms of Predictive Ability and Misclassification Cost in Data Mining: An Empirical Study
Authors
Abstract
Classification trees are commonly used in data mining, particularly for investigating interactions among predictors. Splitting rules and decision tree techniques employ algorithms that are largely based on statistical and probabilistic methods. The splitting procedure is the most important phase of classification tree training. The aim of this study is to compare the Gini and Twoing splitting rules in terms of misclassification cost, the optimal balanced trees obtained, and the importance of independent variables. The results show that the Twoing criterion yields a tree that is much more evenly balanced than the tree obtained with the Gini criterion. The misclassification rate differed slightly between the two methods (19% with the Twoing criterion and 21.2% with the Gini criterion). With the Twoing splitting rule, the independent variables received higher importance levels, and the improvement values were higher than with the Gini algorithm. All things considered, the Twoing splitting rule performed well in this study, combining good performance with robustness in classification accuracy, tree structure, and the importance of independent variables.
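The abstract does not state the formulas behind the two splitting rules, so the following is only a minimal sketch using the textbook CART definitions, not the study's own implementation. It assumes a binary split of a node into non-empty left and right groups of class labels. The Gini improvement is the parent's impurity minus the weighted impurities of the children, and the Twoing value is (p_L * p_R / 4) times the squared sum of absolute differences between the class proportions in the two children. The function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_improvement(left, right):
    """Decrease in Gini impurity achieved by splitting a node into left and right."""
    parent = left + right
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return (gini_impurity(parent)
            - p_left * gini_impurity(left)
            - p_right * gini_impurity(right))


def twoing_value(left, right):
    """Twoing criterion: (p_L * p_R / 4) * (sum_k |p(k|left) - p(k|right)|)^2."""
    parent = left + right
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Sum of absolute differences in class proportions between the children.
    diff = sum(
        abs(left_counts[c] / len(left) - right_counts[c] / len(right))
        for c in set(parent)
    )
    return (p_left * p_right / 4.0) * diff ** 2


# Hypothetical toy split, not data from the study.
left = ["a", "a", "a", "b"]
right = ["b", "b", "b", "a"]
print(gini_improvement(left, right))
print(twoing_value(left, right))
```

Under both rules, a tree builder would evaluate every candidate split at a node and keep the one with the largest value. The Twoing term p_L * p_R is largest when the two children are of equal size, which is consistent with the more balanced trees reported above.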
Similar resources
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Predicting Bankruptcy of Companies using Data Mining Models and Comparing the Results with Z Altman Model
One of the issues helping make investment decisions is appropriate tools and models to evaluate financial situation of the organization. By means of these tools, investors can analyze financial situation of the organization and identify financial distress or an ideal condition, they become aware of making decisions to invest in appropriate conditions. The main objective of this study is to ev...
Comparison of the Efficiency of Data Mining Algorithms in Predicting the Diagnosis of Diabetes
Background: Diabetes is one of the major health problems in Iran and about 4.6 million adults suffer from this disease. Poor diagnosis of this disease has caused half of this number to be unaware of their disease. In recent years, along with the use of computers in data analysis and storage, the volume and complexity of data has increased dramatically. Methods: In health organizations, data pl...
Performance evaluation of gang saw using hybrid ANFIS-DE and hybrid ANFIS-PSO algorithms
One of the most significant and effective criteria in the process of cutting dimensional rocks using the gang saw is the maximum energy consumption rate of the machine, and its accurate prediction and estimation can help designers and owners of this industry to achieve an optimal and economic process. In the present research work, it is attempted to study and provide models for predicting the m...
Comparison of Four Data Mining Algorithms for Predicting Colorectal Cancer Risk
Background and Objective: Colorectal cancer (CRC) is one of the most prevalent malignancies in the world. The early detection of CRC is not only a simple process, but it is also the key to its treatment. Given that data mining algorithms could be potentially useful in cancer prognosis, diagnosis, and treatment, the main focus of this study is to measure the performance of some data mining class...